Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading
Authors
Abstract
The amount of data movement in an application has a direct impact on both its execution time and its power consumption. One way to reduce it is to implement locality-aware scheduling algorithms that maximize the reuse of data when assigning work to hardware threads. Locality-Driven Code Scheduling (LDCS), one such algorithm, groups the tasks that process a common data block into phases of a single coarse-grain construct named a super-task, with each phase fired according to dataflow semantics. LDCS reduces the number of long-latency operations by executing all the phases of a super-task on the same hardware thread and by reading the data block from main memory only in the first phase and writing it back only in the last phase, while the intermediate phases rely on the presence of the block in the upper levels of the memory hierarchy. This paper analyzes the impact that LDCS can have on the execution time and power consumption of an application, and presents experimental results from two systems, one with a software-managed memory hierarchy and one with hardware data caches, showing that LDCS can improve the power efficiency of an application by up to 72% and by 28% on average, respectively, under weak scaling.
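To make the idea concrete, the sketch below shows in plain C with POSIX threads how tasks that touch the same data block could be fused into a super-task whose phases run back-to-back on a single worker thread, with the block read from main memory only before the first phase and written back only after the last one. This is a minimal illustration under assumed names: the super_task structure, the phase functions, and the local buffer standing in for the upper levels of the memory hierarchy are invented for the example and are not the authors' implementation.

```c
/*
 * Illustrative sketch in the spirit of LDCS: all phases that process the
 * same data block run back-to-back on one worker thread, so the block is
 * loaded from main memory once and stored back once.  Thread pinning to a
 * specific core (e.g. via an affinity call) is omitted for brevity.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1024

typedef void (*phase_fn)(double *block, size_t n);

typedef struct {
    double   *block_in_memory;   /* data block resident in main memory  */
    phase_fn *phases;            /* tasks grouped into one super-task   */
    int       num_phases;
} super_task;

/* Example phases: each one reuses the block already brought on-chip. */
static void scale_phase(double *b, size_t n) { for (size_t i = 0; i < n; i++) b[i] *= 2.0; }
static void shift_phase(double *b, size_t n) { for (size_t i = 0; i < n; i++) b[i] += 1.0; }
static void clamp_phase(double *b, size_t n) { for (size_t i = 0; i < n; i++) if (b[i] > 100.0) b[i] = 100.0; }

/* One worker (hardware thread) executes every phase of its super-task. */
static void *run_super_task(void *arg)
{
    super_task *st = (super_task *)arg;
    double local[BLOCK_SIZE];           /* stands in for cache / scratchpad */

    /* First phase: the only read of the block from main memory. */
    memcpy(local, st->block_in_memory, sizeof local);

    /* Remaining phases fire in their dependence order (a simple chain
       here) and reuse the on-chip copy; no extra main-memory traffic. */
    for (int p = 0; p < st->num_phases; p++)
        st->phases[p](local, BLOCK_SIZE);

    /* Last phase: the only write of the block back to main memory. */
    memcpy(st->block_in_memory, local, sizeof local);
    return NULL;
}

int main(void)
{
    double *block = calloc(BLOCK_SIZE, sizeof *block);
    phase_fn phases[] = { scale_phase, shift_phase, clamp_phase };
    super_task st = { block, phases, 3 };

    pthread_t worker;
    pthread_create(&worker, NULL, run_super_task, &st);
    pthread_join(worker, NULL);

    printf("block[0] after super-task: %f\n", block[0]);
    free(block);
    return 0;
}
```

In this arrangement only two long-latency transfers occur per block, however many tasks operate on it, which is the source of the reduction in data movement that the abstract describes.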
Similar Resources
Evaluating map reduce tasks scheduling algorithms over cloud computing infrastructure
Efficiently scheduling MapReduce tasks is considered one of the major challenges facing MapReduce frameworks. Many algorithms have been introduced to tackle this issue. Most of them focus on the data locality property for task scheduling. Data locality may, however, cause lower physical resource utilization and higher power consumption in non-virtualized clusters. Virtualized clust...
Array Regrouping on CMP with Non-uniform Cache Sharing
Array regrouping enhances program spatial locality by interleaving elements of multiple arrays that tend to be accessed closely together. Its effectiveness has been systematically studied for sequential programs running on unicore processors, but not for multithreaded programs on modern Chip Multiprocessor (CMP) machines. On one hand, the processor-level parallelism on CMP intensifies memory bandwidth ...
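As a rough illustration of the regrouping idea, the hypothetical C fragment below contrasts two separate arrays, whose corresponding elements land on different cache lines, with an interleaved layout in which each pair of co-accessed elements shares a line. The names pair_t, sum_separate, and sum_regrouped are invented for the example and do not come from the cited work.

```c
/* Sketch of array regrouping: two arrays whose elements are always used
   together are interleaved so that each iteration touches one cache line
   instead of two.  Layout and names are illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

/* Before regrouping: a[i] and b[i] live in different arrays. */
static double sum_separate(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* After regrouping: a[i] and b[i] are adjacent in memory. */
typedef struct { double a, b; } pair_t;

static double sum_regrouped(const pair_t *p, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].a * p[i].b;
    return s;
}

int main(void)
{
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
    pair_t *p = malloc(N * sizeof *p);
    for (size_t i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; p[i].a = 1.0; p[i].b = 2.0; }

    printf("separate:  %f\n", sum_separate(a, b, N));
    printf("regrouped: %f\n", sum_regrouped(p, N));

    free(a); free(b); free(p);
    return 0;
}
```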
Simulation Study of Multithreaded Virtual Processor
This paper proposes the Multithreaded Virtual Processor (MVP) architecture model as a means of integrating the multithreaded programming paradigm and a modern superscalar processor with support for fast context switching and thread scheduling. In order to validate our idea, a simulator was developed using a POSIX compliant Pthreads package and a generic superscalar simulator called SimpleScalar...
Effects of Multithreading on Cache Performance
As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising and exciting techniques used to tolerate memory latency by exploiting thread-level parallelism. The question, however, remains as to how effective multithreading is at tolerating memory latency. The...
Cache-Affinity Scheduling for Fine Grain Multithreading
Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ‘batching’, for reducing the negative impa...